Stochastic phonological grammars and acceptability
In foundational works of generative phonology it is claimed that subjects can
reliably discriminate between possible but non-occurring words and words that
could not be English. In this paper we examine the use of a probabilistic
phonological parser for words to model experimentally obtained judgements of
the acceptability of a set of nonsense words. We compared various methods of
scoring the goodness of the parse as a predictor of acceptability. We found
that the probability of the worst part of the word is not the best score of acceptability,
indicating that classical generative phonology and Optimality Theory miss an
important fact, as these approaches do not recognise a mechanism by which the
frequency of well-formed parts may ameliorate the unacceptability of
low-frequency parts. We argue that probabilistic generative grammars are
demonstrably a more psychologically realistic model of phonological competence
than standard generative phonology or Optimality Theory.
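To make the scoring comparison concrete, the sketch below contrasts two ways of turning the log-probabilities of a word's parsed parts into an acceptability score: the probability of the single worst part versus an average that lets frequent, well-formed parts offset a rare one. The parse and its probabilities are invented for illustration and are not taken from the paper.

```python
import math

# Hypothetical constituent log-probabilities for a parsed nonsense word
# (values invented for illustration only).
parsed_word = [
    ("mr", math.log(0.0005)),    # rare onset: the low-probability part
    ("up", math.log(0.08)),
    ("ation", math.log(0.12)),   # frequent, well-formed ending
]

def worst_part_score(parse):
    """Score = log-probability of the single worst part."""
    return min(logp for _, logp in parse)

def mean_logprob_score(parse):
    """Score = average log-probability: frequent parts can ameliorate
    the effect of a single low-frequency part."""
    return sum(logp for _, logp in parse) / len(parse)

print("worst-part score:   ", worst_part_score(parsed_word))
print("mean log-prob score:", mean_logprob_score(parsed_word))
```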
Not wacky vs. definitely wacky: A study of scalar adverbs in pretrained language models
Vector space models of word meaning all share the assumption that words
occurring in similar contexts have similar meanings. In such models, words that
are similar in their topical associations but differ in their logical force
tend to emerge as semantically close, creating well-known challenges for NLP
applications that involve logical reasoning. Modern pretrained language models,
such as BERT, RoBERTa, and GPT-3, hold the promise of performing better on
logical tasks than classic static word embeddings. However, reports are mixed
about their success. In the current paper, we advance this discussion through a
systematic study of scalar adverbs, an under-explored class of words with
strong logical force. Using three different tasks, involving both naturalistic
social media data and constructed examples, we investigate the extent to which
BERT, RoBERTa, GPT-2, and GPT-3 exhibit general, human-like knowledge of these
common words. We ask: 1) Do the models distinguish amongst the three semantic
categories of MODALITY, FREQUENCY and DEGREE? 2) Do they have implicit
representations of full scales from maximally negative to maximally positive?
3) How do word frequency and contextual factors impact model performance? We
find that despite capturing some aspects of logical meaning, the models fall
far short of human performance.
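A minimal sketch of the kind of masked-language-model probing such a study can use: BERT scores a set of DEGREE adverbs in a single carrier sentence, and the resulting ranking can be compared against the intended scale. The carrier sentence and candidate adverbs are illustrative choices, not the paper's stimuli.

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

# Illustrative carrier sentence and DEGREE-scale candidates.
sentence = f"The movie was {tokenizer.mask_token} good."
candidates = ["slightly", "somewhat", "fairly", "very", "extremely"]

inputs = tokenizer(sentence, return_tensors="pt")
mask_pos = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]

with torch.no_grad():
    logits = model(**inputs).logits[0, mask_pos.item()]
log_probs = torch.log_softmax(logits, dim=-1)

for adverb in candidates:
    token_id = tokenizer.convert_tokens_to_ids(adverb)
    if token_id == tokenizer.unk_token_id:
        continue  # skip adverbs that are not single tokens in BERT's vocabulary
    print(f"{adverb:>10}: {log_probs[token_id].item():.2f}")
```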
Recommended from our members
The Meaning of Intonational Contours in the Interpretation of Discourse
Recent investigations of the contribution that intonation makes to overall utterance and discourse interpretation promise new sources of information for long-standing concerns in NLP. In Hirschberg & Pierrehumbert 1986 we proposed that intonational features such as phrasing, accent placement, pitch range, and tune represent important sources of information about the attentional and intentional structures of discourse. In this paper we examine the particular contribution of the choice of tune, or intonational contour, to discourse interpretation.
DagoBERT: Generating Derivational Morphology with a Pretrained Language Model
Can pretrained language models (PLMs) generate derivationally complex words?
We present the first study investigating this question, taking BERT as the
example PLM. We examine BERT's derivational capabilities in different settings,
ranging from using the unmodified pretrained model to full finetuning. Our best
model, DagoBERT (Derivationally and generatively optimized BERT), clearly
outperforms the previous state of the art in derivation generation (DG).
Furthermore, our experiments show that the input segmentation crucially impacts
BERT's derivational knowledge, suggesting that the performance of PLMs could be
further improved if a morphologically informed vocabulary of units were used.
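The segmentation point can be made concrete with BERT's own tokenizer: its default WordPiece splits of derivationally complex words need not align with morpheme boundaries such as de-, anti-, -ify, or -ism, which is why a morphologically informed vocabulary could help. The example words are illustrative (borrowed from the morphological-family examples in the next abstract), not the paper's data.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Default WordPiece segmentation of two illustrative derivatives:
# the subword units need not correspond to the morphemes de-, -ify, anti-, -ism.
for word in ["detrumpify", "antitrumpism"]:
    print(word, "->", tokenizer.tokenize(word))
```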
Predicting the Growth of Morphological Families from Social and Linguistic Factors
We present the first study that examines the evolution of morphological families, i.e., sets of morphologically related words such as “trump”, “antitrumpism”, and “detrumpify”, in social media. We introduce the novel task of Morphological Family Expansion Prediction (MFEP) as predicting the increase in the size of a morphological family. We create a ten-year Reddit corpus as a benchmark for MFEP and evaluate a number of baselines on this benchmark. Our experiments demonstrate very good performance on MFEP.
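One simple baseline consistent with the task description: predict whether a family will expand from a few social and linguistic features using logistic regression. The features and data below are invented placeholders, not the paper's benchmark or baselines.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Placeholder features per morphological family:
# [current family size, mean word frequency, number of distinct subreddits using it]
X = np.array([
    [3, 120.0, 5],
    [1,   4.0, 1],
    [7, 900.0, 40],
    [2,  15.0, 2],
])
# Label: 1 if the family gained new members in the following period, else 0.
y = np.array([1, 0, 1, 0])

clf = LogisticRegression().fit(X, y)
print(clf.predict_proba([[4, 200.0, 10]])[:, 1])  # predicted probability of expansion
```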
The Reddit Politosphere: A Large-Scale Text and Network Resource of Online Political Discourse
We introduce the Reddit Politosphere, a large-scale resource of online political discourse covering more than 600 political discussion groups over a period of 12 years. It is, to the best of our knowledge, the largest and ideologically most comprehensive dataset of its type now available. One key feature of the Reddit Politosphere is that it consists of both text and network data, allowing for methodologically diverse analyses. We describe in detail how we create the Reddit Politosphere, present descriptive statistics, and sketch potential directions for future research based on the resource.
A Graph Auto-encoder Model of Derivational Morphology
There has been little work on modeling the morphological well-formedness (MWF) of derivatives, a problem judged to be complex and difficult in linguistics (Bauer, 2019). We present a graph auto-encoder that learns embeddings capturing information about the compatibility of affixes and stems in derivation. The auto-encoder models MWF in English surprisingly well by combining syntactic and semantic information with associative information from the mental lexicon.
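A minimal sketch of the graph auto-encoder idea, assuming stems and affixes are nodes of a graph whose edges mark attested combinations: node embeddings are trained so that an inner-product decoder reconstructs the adjacency matrix, and an unseen stem-affix pair can then be scored for compatibility. The graph, the trivial embedding-table encoder, and the training details are placeholders, not the paper's model.

```python
import torch
import torch.nn as nn

# Toy graph: 4 stems (nodes 0-3) and 3 affixes (nodes 4-6); placeholder data.
num_nodes, dim = 7, 16
adj = torch.zeros(num_nodes, num_nodes)
for stem, affix in [(0, 4), (0, 5), (1, 4), (2, 6), (3, 5)]:  # attested combinations
    adj[stem, affix] = adj[affix, stem] = 1.0

class InnerProductGAE(nn.Module):
    """Node embeddings whose inner products reconstruct the adjacency matrix."""
    def __init__(self, num_nodes, dim):
        super().__init__()
        self.emb = nn.Embedding(num_nodes, dim)

    def forward(self):
        z = self.emb.weight               # node embeddings
        return torch.sigmoid(z @ z.t())   # reconstructed edge probabilities

model = InnerProductGAE(num_nodes, dim)
optimizer = torch.optim.Adam(model.parameters(), lr=0.05)
loss_fn = nn.BCELoss()

for _ in range(200):
    optimizer.zero_grad()
    loss = loss_fn(model(), adj)
    loss.backward()
    optimizer.step()

# Compatibility score for a held-out stem-affix pair, e.g. stem 1 with affix 6.
print(model()[1, 6].item())
```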